Summary. With the rapid growth in the use and spread of CSCL (Computer Supported
Collaborative Learning) technologies, the need to automatically evaluate the
contribution and impact of utterances and participants has become increasingly
pressing. Setting aside the fact that evaluation itself is a time-consuming process,
it becomes more difficult as more participants are involved in the discussion and as
the utterances become more densely intertwined. In this context, we propose a system
focused on evaluating the involvement and the level of collaboration of the
participants in a discussion. Additionally, a classical approach that combines
natural language processing with social network analysis has proved insufficient
for obtaining a deep understanding of the discourse. Therefore, we propose a model
based on dialogism (Bakhtin) that captures and uses the intertwining of utterances to
evaluate collaboration and the level of polyphony of the entire chat.


Abstract. With the rapid increase in the use and spread of Computer Supported
Collaborative Learning technologies, the need to automatically evaluate the
contribution and the impact of each participant and utterance has increased
substantially. Besides being a time-consuming process, the evaluation of a discussion
becomes even more difficult as the number of participants grows and as utterances
become intertwined. In this context, we propose a system devised for evaluating
the involvement and the degree of collaboration of participants in chat conversations.
Moreover, a traditional NLP (natural language processing) approach combined with
Social Network Analysis proved insufficient for obtaining a deep understanding of the
discourse. Hence, we propose a model based on Bakhtin’s dialogism that captures and
uses the intertwining of utterances in order to assess collaboration and polyphony
within a chat conversation.
1. Introduction


Although instant messaging (chat) has been used for several years, the lack of
sophisticated and accurate methods for analyzing and processing natural language has
hindered the implementation of automatic analysis and feedback systems. Several
programs, such as DIGALO [6], CORDTRA [7] and TATIANA [5], were designed to process
text and conversations. Most of these programs use contingency or argumentation
graphs and only allow annotations and visualizations based on manually added links.


The program presented in this paper uses a polyphonic thread model [12] and
provides detailed feedback to the user on both chat and forum discussions. The
information it provides is derived from Bakhtin's dialogism theory [1, 2, 9], while
the system as a whole is built on a CSCL model. Moreover, the approach successfully
integrates social network analysis with NLP and polyphonic analysis.


2. The fundamental concepts of our system 


The program analyzes chats based on three intertwined concepts, namely utterances,
voices and echoes. Utterances, the core units of any discussion, are segments of the
discussion that differ in terms of the subject at hand. Separating utterances is
difficult, as they can contain anything from a few words to pages of text. Our
analysis follows Dong's perspective [4], which separates utterances based on the
remarks of the participants: whenever a user expresses a different point of view or
makes a new intervention, a new utterance is born.


Our system analyzes both the coherence and the meaning of each utterance by
creating an utterance graph based on explicit links (added by the participants) and
implicit links (derived from the actual discussion). In this graph, each node is an
utterance and the edges express the similarities between utterances, projected on the
discussion timeline. This graph also lets us determine how utterances interact
through their so-called inner voices.


If utterances differ in terms of the subject of discussion, voices differ in terms of
points of view or positions. A certain utterance may be developed and rediscussed
several times during a conversation, thus creating a perspective or topic, hence a
voice.


The remarks of either a single individual or a group can constitute a voice, giving
rise to generalized voices (similar opinions of a larger group of persons), internal
voices (personal opinions) and external voices (opinions openly stated by
individuals). Voices can be measured in different ways, regarding both their
strength and their frequency. A common phenomenon is ventriloquism, in which a
certain voice is re-emitted by another.
In order to obtain good collaboration between individuals, a conversation must
become a true polyphony, in the sense that multiple different voices combine
harmoniously and focus on a specific topic. This process also allows us to perform a
psychological analysis of the participants.


Moreover, a discussion can also be divided into separate parts, each with a main
voice from which a specific context can arise that merges all similar voices. As
such, a voice repeated during a conversation can exert influence upon others,
thereby generating echoes within the discussion. An echo can influence either a
single individual, becoming an individual echo, or a group of people, becoming a
collective echo. As a conversation evolves and contexts begin to form, new voices
arise, which later become echoes and further influence the continuation of the
discussion.


If we take all three previous concepts into account, we notice two effects: a
retrospective and synergetic one, based on merging voices from previous utterances
and their echoes, and a prospective one, showing how the echo of a voice can shape
the entire subsequent discourse.


We can therefore conclude that the inter-animation of users and voices is the core
of collaboration in a conversation. Our system aims to analyze, in a polyphonic
manner, the involvement of and the inter-animation between users and voices within
a conversation. The results of a formal validation round indicate that the system
provides effective feedback to both teachers and students.


3. The conversation evaluation process 


As mentioned before, the first and main step in the analysis of any discussion,
whether it is a chat or otherwise, is the creation of the utterance graph, containing
both explicit links, marked specifically by the users, and implicit ones derived
from the context of the conversation. In this graph, every utterance is a node. The
nodes are connected by edges, each with an assigned trust value corresponding to the
method used to identify it (repetition, co-reference, lexical chains) [8, 11].


As the chat advances, the orientation of the edges follows the timeline of the
discussion, that is, the evolution of the conversation in time. The initial links
point in the opposite direction, connecting the current utterance to an earlier one
to which it is linked either implicitly or explicitly.
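One way to hold such a graph in memory is sketched below. It is a minimal, hypothetical data structure, not the system's actual implementation: the class names and the `TRUST` table are assumptions, and only the trust of 1.0 for explicit links and direct repetition is taken from the text. Links are stored from the current utterance back to an earlier one, while `references_to` reads them in the timeline direction.

```python
from dataclasses import dataclass, field

# Hypothetical trust values per identification method. Only the explicit-link and
# direct-repetition values (1.0) come from the text; the others are placeholders.
TRUST = {"explicit": 1.0, "repetition": 1.0, "co-reference": 0.8, "lexical chain": 0.6}


@dataclass
class Utterance:
    index: int
    speaker: str
    text: str


@dataclass
class Link:
    current: int  # the later utterance, where the link originates
    earlier: int  # the earlier utterance it refers back to
    method: str   # identification method, a key of TRUST

    @property
    def trust(self) -> float:
        return TRUST[self.method]


@dataclass
class UtteranceGraph:
    utterances: list = field(default_factory=list)
    links: list = field(default_factory=list)

    def add_utterance(self, speaker: str, text: str) -> int:
        self.utterances.append(Utterance(len(self.utterances), speaker, text))
        return len(self.utterances) - 1

    def add_link(self, current: int, earlier: int, method: str) -> None:
        # Links are stored against the timeline: from the current utterance
        # back to an earlier one, never forward.
        if not 0 <= earlier < current < len(self.utterances):
            raise ValueError("a link must point from an utterance to an earlier one")
        self.links.append(Link(current, earlier, method))

    def references_to(self, index: int) -> list:
        # Later utterances that refer back to this one (timeline direction).
        return [link.current for link in self.links if link.earlier == index]


g = UtteranceGraph()
a = g.add_utterance("A", "What is LSA?")
b = g.add_utterance("B", "LSA is a vector space model.")
g.add_link(b, a, "repetition")
```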


Evaluating an utterance is a challenging task. Therefore, quantitative, qualitative
and social aspects were considered when grading every utterance. These dimensions
enabled us to evaluate the impact of a single utterance, taken individually or within
all of its corresponding discussion threads.
The first step in this analysis is to correct an utterance grammatically and
orthographically, eliminate stop words and count the characters of the remaining
(stemmed) words, thus obtaining the quantitative grade. In addition, to avoid
rewarding the unnecessary repetition of certain words, the logarithm of the number of
occurrences of each word is used instead of the raw count.
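A minimal sketch of this quantitative grade, assuming each distinct remaining stem contributes its length weighted by 1 + log(occurrences). The spelling-correction step is omitted, and both `STOP_WORDS` and `naive_stem` are hypothetical stand-ins for a real stop-word list and stemmer (such as Porter's).

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (hypothetical; a real system would use a full list).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "it", "we", "i"}


def naive_stem(word: str) -> str:
    # Crude suffix stripping, a stand-in for a real stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def quantitative_grade(text: str) -> float:
    words = re.findall(r"[a-z]+", text.lower())
    stems = Counter(naive_stem(w) for w in words if w not in STOP_WORDS)
    # Each distinct stem contributes its length, damped logarithmically by its
    # number of occurrences so that repetition is not rewarded linearly.
    return sum(len(stem) * (1 + math.log(count)) for stem, count in stems.items())
```

For example, "chat chats" reduces to the stem "chat" occurring twice, which gives 4 × (1 + log 2) rather than 8.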


The next step is to evaluate the similarity between the current utterance and the
vector of the entire chat, which synthesizes its correlation with the overall
discussion. Furthermore, given the vector of a set of keywords chosen by the mentor
as relevant subjects for the discussion, we can evaluate the similarity, and
therefore the coverage of those topics, within each utterance. The qualitative mark
combines these two aspects and reflects the importance of each utterance with regard
to the entire discussion and to the predefined set of keywords.
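The qualitative step can be illustrated as follows. This sketch computes emphasis as the product of the two cosine similarities, matching the definition of emphasis(u) given in formula (2). To stay short, it uses plain bag-of-words vectors in place of the LSA vectors the system actually relies on. That substitution and the function names are assumptions made only for illustration.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Stand-in for an LSA vector: raw term counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def emphasis(utterance: str, whole_chat: str, keywords: list) -> float:
    # Similarity to the whole discussion times coverage of the mentor's keywords.
    u = bag_of_words(utterance)
    return cosine(u, bag_of_words(whole_chat)) * cosine(u, bag_of_words(" ".join(keywords)))
```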


Our semantic analysis is based on a vector space model from Latent Semantic
Analysis [10], and similarity is computed by means of the cosine measure. The vector
space, including the links between concepts, can also be visualized, as depicted in
Fig. 1:
Fig. 1. Physical and radial view of the vector space model starting from a given word.
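The LSA step can be sketched with a truncated singular value decomposition. This is a minimal illustration using NumPy, not the system's implementation. The toy term-by-document matrix is invented, and `lsa_basis` and `project` are hypothetical helpers. A term-count vector for a document, an utterance or a keyword set is mapped into the latent space through the k leading left singular vectors, and the vectors are then compared with the cosine measure.

```python
import numpy as np


def lsa_basis(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Return the k leading left singular vectors of a term-by-document matrix."""
    u, _, _ = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k]


def project(basis: np.ndarray, term_vector: np.ndarray) -> np.ndarray:
    """Map a term-count vector into the k-dimensional latent space."""
    return basis.T @ term_vector


def cosine(x: np.ndarray, y: np.ndarray) -> float:
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom else 0.0


# Toy 4-term x 3-document matrix (rows: chat, forum, voice, echo), made up for illustration.
A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)
basis = lsa_basis(A, k=2)
docs = [project(basis, A[:, j]) for j in range(A.shape[1])]
print(round(cosine(docs[0], docs[1]), 6))  # documents 0 and 1 share all terms
```

Projecting with U_k^T places documents from the matrix and newly arriving utterances in the same space, so no separate folding-in step is needed.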


The final step in assessing each utterance is the social analysis. In this step, we
analyze the utterance graph using metrics similar to those of social network
analysis, although centrality is not very meaningful for such a conversation. Both
previous perspectives are reused here: the quantitative one for the number of links
and the qualitative one by means of LSA similarity.
In our evaluation, we currently use in-degree and out-degree to assess the utterance
graph, but we plan to add other measures relevant to this specific structure of
intertwining utterances.
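A minimal sketch of the social step, assuming links are stored as (current, earlier) index pairs. In-degree counts how many later utterances refer to an utterance, and out-degree counts how many earlier ones it refers to. Formula (3) multiplies 1 + log f(u) over the chosen social factors. The text leaves two points open, so both are assumptions here: which factors are used (for instance these degrees, or LSA-weighted variants of them) and how zero values are handled. Because log 0 is undefined, this sketch skips zero-valued factors.

```python
import math
from collections import Counter


def degrees(links):
    """links: (current, earlier) pairs; returns (in_degree, out_degree) counters."""
    out_deg = Counter(current for current, _ in links)  # references an utterance makes
    in_deg = Counter(earlier for _, earlier in links)   # references it receives later on
    return in_deg, out_deg


def social(factors):
    """Formula (3): product over social factors of (1 + log f(u)).
    Assumption: zero-valued factors are skipped, since log(0) is undefined."""
    result = 1.0
    for f in factors:
        if f > 0:
            result *= 1 + math.log(f)
    return result


in_deg, out_deg = degrees([(1, 0), (2, 0), (2, 1)])
```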


Based on the described algorithm, the mathematical formulas we use to grade 
each individual utterance are the following: 


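$$\mathrm{mark}(u) = \left( \sum_{\text{remaining words}} \mathrm{length}(\mathrm{stem}) \times \left(1 + \log(\mathrm{no\_occurrences})\right) \right) \times \mathrm{emphasis}(u) \times \mathrm{social}(u) \tag{1}$$

$$\mathrm{emphasis}(u) = \mathrm{Sim}(u, \mathrm{whole\_document}) \times \mathrm{Sim}(u, \mathrm{predefined\_keywords}) \tag{2}$$

$$\mathrm{social}(u) = \prod_{\substack{\text{all social factors } f\\ \text{(quantitative and qualitative)}}} \left(1 + \log f(u)\right) \tag{3}$$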
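As a worked illustration of formula (1), the function below multiplies the three factors together. It is a sketch under stated assumptions: the stems are taken as already extracted and counted, emphasis(u) is passed in as a precomputed number, and every social factor is positive so that log f(u) is defined. The function name and argument layout are hypothetical.

```python
import math


def mark(stem_counts: dict, emphasis_value: float, social_factors: list) -> float:
    """Formula (1). stem_counts maps each remaining stem to its number of occurrences;
    social_factors are assumed to be positive so that log f(u) is defined."""
    quantitative = sum(len(stem) * (1 + math.log(n)) for stem, n in stem_counts.items())
    social_value = math.prod(1 + math.log(f) for f in social_factors)  # formula (3)
    return quantitative * emphasis_value * social_value
```

For instance, a single stem "chat" occurring once, an emphasis of 0.5 and one social factor equal to 1 give 4 × 1 × 0.5 × 1 = 2.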


4. The assessment of collaboration in a chat conversation 


Being essential to every successful conversation, collaboration in a chat is
analyzed from multiple perspectives. We take into account social cohesion and
collaboration, quantitative collaboration, and gain-based collaboration, each of
which is explained below.


Based on the previous analysis, we assume that a general principle applies to any
type of conversation: the more similar the speakers' involvement and knowledge,
measured by the number of utterances they exchange and by the topics they cover
(analyzed with LSA), the more efficient the collaboration, as all speakers will be on
the same ground. This principle underlies the social cohesion and collaboration
analysis, which is based on social network analysis.


Since this analysis differs from standard social network analysis, a coefficient of
variation is computed for each metric, in order to spread the measure uniformly
across each individually considered factor. The final result is computed as the
difference between the initial total and the average of all partial results, so that
impact and cohesion improve as collaboration between individuals increases. In the
current state of the project, these factors are not weighted; future upgrades will
include weighted influences for each factor of the collaboration evaluation,
depending on the status of the speaker (for example, in-degree remarks should be more
relevant to the current speaker).
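A possible reading of this cohesion measure is sketched below, with several assumptions labeled. Each metric is represented by one value per participant, for example per-participant marks or degrees. The coefficient of variation is the population standard deviation divided by the mean. The "initial total" is taken to be 1, which the text does not state. Equal involvement then yields a cohesion of 1, and larger disparities lower it. The metric names and the default value are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def social_cohesion(metrics: dict, initial_total: float = 1.0) -> float:
    """metrics: metric name -> list of per-participant values.
    Assumption: the 'initial total' is 1.0; the result is not clamped."""
    cvs = [coefficient_of_variation(values) for values in metrics.values()]
    return initial_total - mean(cvs)
```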


By far the most direct approach to evaluating chat discussions is the use of explicit
and implicit links. We can determine the degree of collaboration from their number,
weighted by the trust coefficient assigned to each link. As a subject is discussed by
multiple users, each reference to a previous utterance of another speaker increases
the degree of collaboration between speakers; links between utterances of different
speakers are therefore an effective indicator of collaboration.


Based on this principle, we can compute a quantitative collaboration grade: 


$$\text{quantitative collaboration} = \frac{\sum_{\text{all links } l \text{ with different speakers}} \mathrm{trust}(l)}{\text{total number of links (implicit/explicit)}} \tag{4}$$


where trust(l) is the trust assigned to an implicit or explicit link l (for example,
in the case of direct repetition this value is set to 1); for all explicit links,
trust is set to 1.
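Formula (4) translates almost directly into code. This is a minimal sketch that assumes links are given as (current, earlier, trust) triples and speakers as a mapping from utterance index to speaker name. Both representations are illustrative choices, not the system's own.

```python
def quantitative_collaboration(links, speaker_of):
    """Formula (4). links: (current, earlier, trust) triples;
    speaker_of: utterance index -> speaker name."""
    if not links:
        return 0.0
    # Only links joining utterances of different speakers contribute, by their trust.
    cross = sum(trust for current, earlier, trust in links
                if speaker_of[current] != speaker_of[earlier])
    return cross / len(links)
```

With speakers A, B, A for utterances 0 to 2 and links (1, 0, 1.0), (2, 1, 0.5) and (2, 0, 1.0), the last link joins two utterances of the same speaker, so the grade is (1.0 + 0.5) / 3 = 0.5.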


In any conversation, we can build personal and collaborative knowledge, depending on
whether knowledge is considered to be defined through individual study and
experience or to have a collective origin. From these perspectives, we derived the
concept of gain-based collaboration [3]. Gain can be either personal, when connected
utterances have the same speaker, or collaborative, when information is generated by
subsequent involvement in an already existing discussion thread. Both individual and
collective gains can easily be connected to the personal and collective echoes
throughout a conversation, interweaving inner dialogue (based on individual voices)
with explicit dialogue (based on interconnected utterances).


Based on the previous statements, the following formulas are used for evaluating 
gain throughout the conversation: 


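$$\mathrm{gain}(u) = \mathrm{personal\_gain}(u) + \mathrm{collaborative\_gain}(u) \tag{5}$$

$$\mathrm{personal\_gain}(u) = \sum_{\substack{\text{link } l \text{ between } u \text{ and } v,\\ v \text{ is an earlier utterance and}\\ u \text{ and } v \text{ have the same speaker}}} \left(\mathrm{mark}(v) + \mathrm{gain}(v)\right) \times \mathrm{similarity}(u, v) \times \mathrm{trust}(l) \tag{6}$$

$$\mathrm{collaborative\_gain}(u) = \sum_{\substack{\text{link } l \text{ between } u \text{ and } v,\\ v \text{ is an earlier utterance and}\\ u \text{ and } v \text{ have different speakers}}} \left(\mathrm{mark}(v) + \mathrm{gain}(v)\right) \times \mathrm{similarity}(u, v) \times \mathrm{trust}(l) \tag{7}$$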
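Formulas (6) and (7) refer only to gain(v) of earlier utterances, so all gains can be computed in a single pass in chronological order. The sketch below rests on several assumptions that are not taken from the text. Utterances are indexed chronologically. Links are (u, v, trust) triples with v < u. Similarity is supplied as a function, for instance an LSA cosine.

```python
def compute_gains(marks, speakers, links, similarity):
    """Formulas (5)-(7). marks[i] and speakers[i] describe utterance i (chronological order);
    links: (u, v, trust) triples with v < u; similarity(u, v) -> float."""
    n = len(marks)
    personal = [0.0] * n
    collaborative = [0.0] * n
    gain = [0.0] * n
    # Group links by their later endpoint so each utterance sees its references.
    incoming = {}
    for u, v, trust in links:
        if not v < u:
            raise ValueError("v must be an earlier utterance than u")
        incoming.setdefault(u, []).append((v, trust))
    for u in range(n):
        for v, trust in incoming.get(u, []):
            # gain[v] is final here, because v < u was processed earlier.
            contribution = (marks[v] + gain[v]) * similarity(u, v) * trust
            if speakers[u] == speakers[v]:
                personal[u] += contribution       # formula (6)
            else:
                collaborative[u] += contribution  # formula (7)
        gain[u] = personal[u] + collaborative[u]  # formula (5)
    return personal, collaborative, gain


personal, collaborative, gain = compute_gains(
    marks=[1.0, 2.0, 1.0],
    speakers=["A", "B", "A"],
    links=[(1, 0, 1.0), (2, 1, 1.0), (2, 0, 0.5)],
    similarity=lambda u, v: 0.5,
)
```

Because gain(v) feeds into later gains, knowledge built early in a thread propagates along chains of linked utterances.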


We can thus identify two chains of interconnected elements: personal gain, personal
knowledge building, individual echoes and inner dialogue on one side, and
collaborative gain, collaborative knowledge building, collective echoes and explicit
dialogue on the other. These chains are the cornerstone for analyzing collaboration
in any conversation.


From the previous formulas, we can derive two metrics for analyzing collaboration in
a chat conversation:
• Formula (8) estimates the share of the overall utterance importance (marks) that
corresponds to information built or transferred in a collaborative manner:

$$\text{mark based collab} = \frac{\sum_{\text{all utterances } u} \mathrm{collaborative\_gain}(u)}{\sum_{\text{all utterances } u} \mathrm{mark}(u)} \tag{8}$$

• Formula (9) assesses collaboration relative to the overall gain, i.e., the share of
the gain that does not come from personal (inner) knowledge building:

$$\text{gain based collab} = \frac{\sum_{\text{all utterances } u} \mathrm{collaborative\_gain}(u)}{\sum_{\text{all utterances } u} \mathrm{gain}(u)} \tag{9}$$
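Both metrics are ratios of sums over all utterances, as sketched below. The sketch assumes the per-utterance marks, gains and collaborative gains are already available as lists. Returning 0 for an empty denominator is an assumption, because the text does not cover that case.

```python
def mark_based_collab(collaborative, marks):
    """Formula (8): collaborative gain relative to the total of all marks."""
    total = sum(marks)
    return sum(collaborative) / total if total else 0.0


def gain_based_collab(collaborative, gains):
    """Formula (9): collaborative share of the total gain (excludes personal gain)."""
    total = sum(gains)
    return sum(collaborative) / total if total else 0.0
```

For instance, collaborative gains [0, 0.5, 1.25] with marks [1, 2, 1] and gains [0, 0.5, 1.5] give a mark based collaboration of 1.75 / 4 = 0.4375 and a gain based collaboration of 1.75 / 2 = 0.875.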


Based on these factors, we can assess the overall collaboration as a product of the 
previous metrics: 


$$\text{overall collaboration} = \text{social cohesion} \times \text{quantitative} \times \text{gain based} \times \text{mark based} \tag{10}$$
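As an arithmetic check of formula (10), the snippet below multiplies the four component values reported in Fig. 2, with the percentages converted to fractions. The result reproduces the overall collaboration of 0.344 shown there.

```python
def overall_collaboration(social_cohesion, quantitative, gain_based, mark_based):
    # Formula (10): plain product of the four collaboration metrics.
    return social_cohesion * quantitative * gain_based * mark_based


# Values reported in Fig. 2: 0.669, 64.3%, 91.44%, 0.875.
print(round(overall_collaboration(0.669, 0.643, 0.9144, 0.875), 3))  # 0.344
```

Because the metrics are multiplied, a weak score on any single dimension lowers the overall collaboration proportionally.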


[Figure: "Collaboration and chat evolution" panel. Reported values: social cohesion and equilibrium 0.669; quantitative collaboration 64.3%; gain based collaboration 91.44%; mark based collaboration 0.875; overall collaboration 0.344.]

Fig. 2. Collaboration assessment and chat evolution visualization.


5. Conclusions 


Due to the multiple limitations of the NLP paradigm, chat and other similar types of
conversation currently cannot be suitably analyzed in terms of inter-collaboration and
user participation, because of the complex connections that need to be considered.
The most suitable model for such an analysis would be the polyphonic one, founded on
Bakhtin's principles of dialogism and polyphony, as such a model could correctly
evaluate the degree of collaboration in a conversation.


58 Mihai Dascalu, Stefan Trausan-Matu, Traian Rebedea, Constantin Daniil 


Our system is based on such a model and analyzes CSCL chats with the purpose of
accurately grading the involvement and knowledge of users during a conversation.
Our analysis uses an utterance graph as its foundation and integrates several
relevant metrics and technologies, enabling a deep understanding of the discussion.